The web of relations linking technological innovations can be fairly described in terms of patent
citations. The resulting patent citation network provides a picture of the large-scale organization
of innovations and its time evolution. Here we study the patterns of change of patents registered
by the US Patent and Trademark Office (USPTO). We show that the scaling behavior exhibited
by this network is consistent with a preferential attachment mechanism together with a
Weibull-shaped aging term. Such an attachment kernel is shared by scientific citation networks,
thus indicating a universal type of mechanism linking ideas and designs and their evolution.
The implications for the evolutionary theory of innovation are discussed.

PACS numbers: 87.23.Kg 
I. INTRODUCTION

Innovation takes place both in nature and technology.
Either through symbiosis [?], tinkering [?] or design [?],
new functional structures and artifacts are obtained.
Such new entities often result from the combination
of predefined designs or building blocks, although a
completely new solution can also emerge. This is the case,
for example, of the replacement of vacuum tube technology
by semiconductors. However, the majority of technological
(and evolutionary) changes take place by means of
a progressive path of change. Such steady and successful
transformation of designs is largely based on an extensive
combination and refinement of existing designs.

A surrogate of the ways in which innovations take place
in time is provided by patent files. Patents are well-defined
objects introducing a novel design, method or
solution for a given problem or set of problems. Additionally,
they indicate what previous novelties have been
required to build the new one. In order to gain insight
into the global organization of the patterns of innovation
and their evolution in technology, here we study a very
large database including all USPTO patents from 1975
to 2005 [?].

As occurs with the fossil record for evolution, the
record of patents through time provides us with the
opportunity to see how new inventions emerge and how
they relate to previous ones. A given patent will typically
require new solutions and previously achieved results.
Looking at how patents link to each other is the
simplest way of obtaining a large-scale picture of the
patterns and processes associated with the collective
dynamics of innovation [?, 1]. Many interesting questions
can be formulated in relation to this: what is the
global organization of interactions among innovations?
Is this a repeatable pattern? How are similar classes of
innovations related to each other? Do these patterns
respond to history-dependent rules or are they instead
describable by means of simple models? These questions are
addressed here and it is shown that a standard statistical
physics approach provides a good picture of how these
webs emerge.

FIG. 1: From (a) to (f), evolution of a patent subset
related to computed tomography. The hub in the center
corresponds to the precursor invention by G. Hounsfield
(US patent 3778614).

The paper is organized as follows: in section II the
data set analysed is presented. In section III the
topological trends exhibited by the patent citation network
are discussed in the light of a model of graph growth
with aging (section IV). In section V our basic results are
summarized and their implications outlined.



II. PATENT CITATION NETWORKS 

Previous studies have measured the value of an innovation
by means of the analysis of patent citations, i.e.,
the rate of receiving new citations. However, innovation
is an elusive notion that is difficult to measure properly
and existing measures provide limited insight [7]. It is a
difficult task to find useful indicators for the value of
innovations. In this context, we introduce patent citation
networks as an appropriate approach to the global analysis
of the process of technological innovation. Recent
work in complex networks provides several models that
describe or reproduce structural features of natural and
artificial evolving systems. Here, we will show how innovation
can be described as a process of network growth
following some specific rules. In particular, our model
provides a rigorous statistical test to assess the balance
between patent importance and patent age, i.e., Price's
"immediacy factor" [?].

The set of patents and their citations describes a
so-called patent citation network $G$. The patent network
belongs to the general class of citation networks, which
includes the scientific citation network. Here, nodes
$v_i \in G$ represent individual patents and the directed link
$(v_i, v_j)$ indicates that patent $v_i$ is a descendant of patent
$v_j$. In order to illustrate the power of the network approach,
we have re-analyzed the evolution of a well-known
patent dataset. Figure 1 shows the time evolution for
the subset of patents in Computed Tomography (CT),
from 1973 to 2004. A smaller subset of this dataset was
analysed in [13]. The figure indicates that some patents
receive many more citations than others. In particular,
the hub at the center corresponds to the very first patent
in CT, associated with its invention by G. Hounsfield.
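As a minimal sketch of this representation, a citation network can be stored as a list of directed pairs (citing, cited), from which the number of citations received by each patent follows directly. The patent identifiers and the helper name `in_degrees` below are hypothetical and are not taken from the USPTO data.

```python
from collections import Counter

def in_degrees(citations):
    """Count the citations received by each patent.

    `citations` is an iterable of (citing, cited) pairs, i.e. directed
    links (v_i, v_j) meaning that patent v_i cites patent v_j.
    """
    return Counter(cited for _citing, cited in citations)

# Toy example with made-up identifiers: "P0" plays the role of a hub.
toy_citations = [("P1", "P0"), ("P2", "P0"), ("P3", "P0"), ("P3", "P2")]
degrees = in_degrees(toy_citations)

# The most cited patent and its number of incoming links.
hub, hub_citations = degrees.most_common(1)[0]
```

Patents that are never cited simply do not appear as keys, and a `Counter` reports them with in-degree zero.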

Interestingly, the network analysis reveals some other
patterns that cannot be easily recovered by other means.
For instance, in figure 1 we can appreciate the modular
organization of the CT patents. Here we have used
the algorithm of Clauset et al. [?] for detecting community
structure in large networks. Roughly speaking, topological
modules are defined as groups of nodes having more
links among them than with other elements in the graph.
Thus, patents belonging to the same module share a common
color. Although we have not explored this problem
in detail, direct inspection of the networks shown in figure
1 reveals that the modular structure seems to correlate
well with shared functional traits. As an example, the
white module involves several related patents associated
with X-ray tomography.

Beyond specific patterns of patent evolution, here we
aim to detect universal trends in the global evolution
of the US patent system. The patent citation network
(PCN) analyzed here has $N = 2801167$ nodes and $L = 18053661$
links. Its time evolution from 1976 to 2005 is
shown in figure 2. The number of patents at a given time
$t$ scales as a power law:

$$N(t) \sim t^{\theta} \qquad (1)$$

with an exponent $\theta = 1.45 \pm 0.06$. Some recent papers
have explored the patent citation datasets at different
levels, including a graph theoretical approach on a large
scale [9] or involving a more specific case study, such as
fuel cell research [10]. Here we will show that the statistical
features of this network can be explained by using an
appropriate attachment kernel describing how successful
patents become more linked and how this preferential
attachment decays with age.

FIG. 2: Time evolution of the number of patents $N(t)$ in
the USPTO dataset from 1973 to 2004. Inset: Cumulative
number of patents on a log-log scale, showing a scaling
$N(t) \sim t^{\theta}$.
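A growth exponent of this kind is commonly estimated as the slope of a straight-line fit on a log-log scale. The sketch below illustrates that idea on synthetic counts generated with a known exponent; it is not the fitting procedure used for the USPTO data, and the function name `fit_power_law_exponent` is an assumption made for illustration.

```python
import math

def fit_power_law_exponent(times, counts):
    """Least-squares slope of log(counts) versus log(times)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic cumulative counts with a known exponent (not USPTO data).
times = list(range(1, 31))                    # e.g. years since the start
counts = [1000.0 * t ** 1.45 for t in times]  # N(t) = A * t^theta
theta_hat = fit_power_law_exponent(times, counts)
```

On real data the choice of time origin affects the fitted slope, so the origin should be fixed before comparing exponents.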



III. DISTRIBUTION OF PATENT CITATIONS 

Citations are often interpreted as indicators of innovation
size or economic value [?]. The distribution of
innovation size (defined as the number of citations to a
patent) is skewed [1, 11, ?]. However, there is an ongoing
discussion about the particular nature of this distribution.
In particular, there is no general agreement on
whether it follows a log-normal or a Pareto distribution
[12, 15]. Still, there are common patterns, like the existence
of some extreme values, which is consistent with
a power-law tail. We report similar features in the
in-degree distribution studied here (see below).

The in-degree distribution $P_i(k)$ is equivalent to the
so-called distribution of the number of patent citations.
Figure 3A shows the in-degree distribution for the patent
citation network in 2004. Notice that $P_i(k)$ is neither
exponential nor a simple power law. Instead, we have
found that an extended power-law form fits the in-degree
distribution very well:

$$P_i(k) \sim (k + k_0)^{-\gamma} \qquad (2)$$

where $k_0 = 19.46 \pm 0.22$ and $\gamma = 4.55 \pm 0.04$. This
extended power law reduces to a power law when $k \gg k_0$
and it degenerates to an exponential distribution for
$k \ll k_0$, since $(1 + k/k_0)^{-\gamma} \approx e^{-\gamma k/k_0}$ in that limit.
The extended power-law distribution has been
related to a mixed attachment mechanism [18]. However,
here we will show that this explanation does not
apply to the patent citation network. Instead, we propose
that the extended power-law form for the in-degree
distribution stems from a combination of both preferential
attachment and aging [19].
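The two limiting regimes of the extended power law can be checked numerically. The sketch below uses the fitted values quoted in the text and compares the local log-log slope at large $k$ with $-\gamma$, and the normalized distribution at small $k$ with an exponential. The helper names are hypothetical and the distribution is left unnormalized.

```python
import math

def extended_power_law(k, k0, gamma):
    """Unnormalized extended power law (k + k0)^(-gamma)."""
    return (k + k0) ** (-gamma)

def local_log_slope(k, k0, gamma, eps=1e-6):
    """Numerical d log P / d log k; analytically -gamma * k / (k + k0)."""
    lo, hi = k * (1 - eps), k * (1 + eps)
    d_log_p = (math.log(extended_power_law(hi, k0, gamma))
               - math.log(extended_power_law(lo, k0, gamma)))
    return d_log_p / (math.log(hi) - math.log(lo))

K0, GAMMA = 19.46, 4.55  # fitted values quoted in the text

# k >> k0: the log-log slope approaches -gamma (pure power law).
slope_large_k = local_log_slope(1e6, K0, GAMMA)

# k << k0: P(k) / P(0) = (1 + k/k0)^(-gamma) is close to exp(-gamma * k / k0).
k_small = 0.5
ratio = extended_power_law(k_small, K0, GAMMA) / extended_power_law(0.0, K0, GAMMA)
exponential = math.exp(-GAMMA * k_small / K0)
```

The crossover between the two regimes sits around $k \approx k_0$, where the local slope equals $-\gamma/2$.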



FIG. 3: (A) The in-degree distribution for the patent citation
network follows an extended power-law distribution,
$P_i(k) \sim (k + k_0)^{-\gamma}$. Three distributions are displayed for three
different time windows, namely 1984 (leftmost), 1992 (center)
and 2002 (rightmost). (B) The in-degree distribution for the
subset of patents displayed in fig. 1f (for computed tomography)
is roughly approximated by a scale-free distribution.
The leftmost point indicates the central hub in fig. 1.



IV. EVOLUTION

Let us assume that every patent has a unique identifier
$0 < i < t$. Our model starts at time $t = 0$, when
there is only one patent in the network. From this initial
network, we add at every time step a new patent $i$ that
references previous patents. Two main forces drive
the evolution of the patent citation network. First, it is
natural to assume that the number of citations received
by a patent (i.e., its incoming links) is a surrogate of its
relevance [?]. Useful patents are more likely to receive
further citations than marginal inventions. Thus, the
probability of receiving new citations should be proportional
to the current number of citations. This rule parallels the
preferential attachment mechanism of network growth [?]. Under
this rule, new elements entering the system connect with
other nodes with a probability $\Pi(k)$ that is proportional
to their degree, i.e.,

$$\Pi(k) \sim k \qquad (3)$$

However, old patents tend to be less relevant in the context
of recent innovations: attachment rates decay as the
patent loses value. In particular, patents are released to
the public domain after some finite period of exploitation.

The evolution of complex networks involving both preferential
attachment and aging has been extensively studied.
In particular, Dorogovtsev and Mendes (DM) determined
analytically the scaling properties of the resulting
networks [19]. In the DM model, the rule of attachment
now scales as:

$$\Pi(k, \tau) \sim k\,\tau^{-\alpha} \qquad (4)$$

where $\tau = t - i$ indicates the age of the $i$-th node and
the exponent $\alpha$ (which is positive) weights how fast
aging affects the likelihood of attachment. Extensions
of this attachment probability kernel include accelerated
growth, with $\Pi(k, \tau) \sim k^{\beta}\tau^{-\alpha}$, and an exponential aging
kernel, $\Pi(k, \tau) \sim k \exp(-\tau^{\alpha})$ [?].

Finally, some models of scientific citation networks
take into account the simultaneous evolution of author
and paper networks [21]. In these models, the rule of
attachment behaves as:

$$\Pi(k_i, \tau) \sim k_i^{\beta}\,\tau^{\alpha}\,[\ldots] \qquad (5)$$

when the time-dependent component follows a Weibull
form. Here, $\tau_0$ controls the rightward extension of the
Weibull curve. As $\tau_0$ increases, so does the probability
of citing older papers. On the other hand, small values
of $\tau_0$ indicate strong aging that favors recently published
patents [21]. Here we choose the simplest assumption
(preferential attachment, $\beta = 1$) and consider a Weibull
aging function $f(\tau)$. Consequently, the average connectivity
of the $i$-th patent at time $t$ evolves according to the
following equation:

$$\frac{\partial k(i,t)}{\partial t} = \frac{m\,k(i,t)\,f(t-i)}{\int_0^t k(u,t)\,f(t-u)\,du} \qquad (6)$$

where $m$ is the number of links introduced at each step
($m = 1$ is the DM model). Now we address the following
question: is the above equation consistent with the
patent network evolution? In the following, we will estimate
the form of the attachment kernel (and the corresponding
$\alpha$, $\beta$ and $\tau_0$ parameters) for the patent citation
data.

First, we consider system size $N$ as our time index
instead of real time $t$. This way we avoid any bias due
to the pattern of non-linear growth [?] and adhere to the
standard formulation of network models. Then, eq. (6)
becomes:

$$\frac{\partial k(i,N)}{\partial N} = \frac{m\,k(i,N)\,f(N-i)}{\int_0^N k(u,N)\,f(N-u)\,du} \qquad (7)$$

Using $\partial k/\partial N = (\partial k/\partial t)(dt/dN)$ and the
time-dependent scaling $N(t) = A t^{\theta}$, we have:

$$\frac{\partial k(i,N)}{\partial N} = \frac{t^{1-\theta}}{A\theta}\,\frac{\partial k(i,t)}{\partial t} \qquad (8)$$
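The growth rule described in this section, in which each new patent cites earlier ones with a probability proportional to their in-degree times an aging function, can be illustrated with a small simulation. This is a sketch under several assumptions that are not taken from the paper: the aging function is the standard two-parameter Weibull density with positive shape, the kernel uses $k + 1$ instead of $k$ so that uncited patents can still be chosen, and the parameter values are purely illustrative rather than fitted.

```python
import math
import random

def weibull_aging(age, alpha, tau0):
    """Standard Weibull density, used here as an assumed aging function."""
    x = age / tau0
    return (alpha / tau0) * x ** (alpha - 1) * math.exp(-(x ** alpha))

def grow_network(n_nodes, m=1, alpha=1.5, tau0=40.0, seed=0):
    """Grow a citation network and return the in-degree of every node.

    At step t a new node t cites m earlier nodes, each drawn with
    probability proportional to (k_i + 1) * f(t - i).
    """
    rng = random.Random(seed)
    in_degree = [0]  # the network starts with a single node
    for t in range(1, n_nodes):
        weights = [(in_degree[i] + 1) * weibull_aging(t - i, alpha, tau0)
                   for i in range(t)]
        # Sampling with replacement: repeated citations are allowed here,
        # and the weights are not updated between the m draws.
        for target in rng.choices(range(t), weights=weights, k=m):
            in_degree[target] += 1
        in_degree.append(0)
    return in_degree

degrees = grow_network(500, m=2)
```

Recomputing all weights at every step costs quadratic time in the number of nodes, which is fine for a toy run but not for millions of patents.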



Now the whole time interval $N$ is partitioned into
$N/\Delta N$ time slots comprising the same number $\Delta N \ll N$
of patents. Here, $N \approx 2.8$ million patents, corresponding
to the time interval 1976-2005. Every time slot $s$ contains
the same number $\Delta N = 10^5$ of new patents.





FIG. 5: Estimation of the attachment rule for the patent
citation network at $T_1 = 2003$. (a) The cumulative preferential
attachment function fits a scaling law $g_{>}(k) \sim k^{\beta+1}$ with $\beta = 1$.
Each curve corresponds to nodes having the same age. (b) Fitting
for the aging function $f(\tau)$ predicts the Weibull distribution
described in the text with $\alpha \approx -1.45$. Each curve corresponds
to nodes having the same in-degree ($k = 1$ for white circles
and $k = 5$ for solid circles). For every curve we have used
$T_0 = T_1 - 1$.



FIG. 4: The normalized attachment kernel $\Pi(k, \tau) \sim g(k) f(\tau)$
determined numerically for the patent citation network at
$T = 2002$.



To measure the attachment rule $\Pi(k_i, \tau)$ we monitor
to which old patents new patents link, as a function of
in-degree $k_i$ and age $\tau$ (recall that here $\tau$ is measured in
number of time slots). We have assumed that attachment
is the product of a preferential attachment function $g(k)$
and an aging function $f(\tau)$:

$$\Pi(k, \tau) \sim g(k)\,f(\tau) \qquad (9)$$



Following [22], we study the citation process in a relatively
short time frame (a time slot $\Delta N$). The large
number of nodes in the system (of the order of $10^6$ nodes)
ensures that we will gather sufficient samples to recover
the attachment kernel. We divide the evolution of the
system into three stages: (i) the system before slot $T_0$, (ii)
the system between slots $T_0$ and $T_1 = T_0 + 1$ and (iii)
the system after $T_1$. When a $T_1$ node joins the system
we record the age $\tau$ and the in-degree $k$ of the $T_0$ node
to which the new node links. We count all the citations
made by new nodes between $T_1$ and $T_1 + 1$. The number
of citations received by $T_0$ nodes from $T_1$ nodes, normalized
by the in-degree frequency $P(k)$, is an approximation
to the attachment kernel (see fig. 4).
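The measurement described above can be sketched as follows. This is a simplified reading of the procedure and not the authors' code: patents are assumed to carry integer indices in order of arrival, so that `index // slot_size` gives the time slot, every patent present before slot `t1` is treated as an eligible target, and the normalization divides by the number of eligible patents in each (in-degree, age) class. The function name `estimate_kernel` and the toy data are hypothetical.

```python
from collections import Counter

def estimate_kernel(citations, slot_size, t1):
    """Estimate Pi(k, tau) from the citations made during time slot t1.

    `citations` is a list of (citing, cited) pairs of integer patent indices.
    Returns {(k, tau): citations received per eligible node in that class}.
    """
    start = t1 * slot_size  # first index belonging to slot t1
    end = start + slot_size
    # In-degree of each old node just before slot t1 begins.
    k_before = Counter(cited for citing, cited in citations if citing < start)
    # Number of old nodes in each (k, tau) class: the normalization.
    population = Counter()
    for node in range(start):
        population[(k_before[node], t1 - node // slot_size)] += 1
    # Citations from slot-t1 nodes to old nodes, binned by the target's class.
    hits = Counter()
    for citing, cited in citations:
        if start <= citing < end and cited < start:
            hits[(k_before[cited], t1 - cited // slot_size)] += 1
    return {cls: hits[cls] / n for cls, n in population.items()}

# Toy data: nodes 0..3 are old (slots 0 and 1), nodes 4 and 5 form slot 2.
toy = [(1, 0), (2, 0), (3, 2), (4, 0), (4, 3), (5, 0)]
kernel = estimate_kernel(toy, slot_size=2, t1=2)
```

In the toy data node 0 has in-degree 2 and age 2 slots when slot 2 begins and receives two new citations, so its class has an estimated rate of 2 per node.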



Using our dataset, we have estimated that $g(k) \sim k^{\beta}$
and found $\beta = 1$, which further validates our assumption
of preferential attachment (see fig. 5a). Notice
that in our fittings we have used the cumulative function
$g_{>}(k) = \int_0^k g(k')\,dk'$ to reduce the noise level. On
the other hand, fig. 5b shows the Weibull distribution,
which fits the aging function $f(\tau)$ very well:

$$f(\tau)\ [\ldots] \qquad (10)$$

with an exponent $\alpha \approx -1.45$ and $\tau_0 \approx 40$. An obvious
advantage of using the Weibull form is that it naturally
includes as limit cases both exponential and Gaussian
distributions.
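One common way to fit a Weibull form is to linearize its cumulative distribution: for $F(\tau) = 1 - \exp[-(\tau/\tau_0)^{a}]$, the quantity $\ln(-\ln(1 - F))$ is a straight line in $\ln \tau$ with slope $a$. The sketch below applies this to synthetic data. It assumes the standard two-parameter Weibull with positive shape, which need not coincide with the parameterization behind eq. (10), and the function name `fit_weibull` and the parameter values are illustrative only.

```python
import math

def fit_weibull(taus, cdf_values):
    """Fit shape and scale by regressing ln(-ln(1 - F)) on ln(tau)."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(-math.log(1.0 - F)) for F in cdf_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    shape = slope                          # slope of the Weibull plot
    scale = math.exp(-intercept / slope)   # intercept = -shape * ln(scale)
    return shape, scale

# Synthetic cumulative aging curve with known parameters (illustrative only).
TRUE_SHAPE, TRUE_SCALE = 1.5, 40.0
taus = [float(t) for t in range(1, 101)]
cdf = [1.0 - math.exp(-((t / TRUE_SCALE) ** TRUE_SHAPE)) for t in taus]
shape_hat, scale_hat = fit_weibull(taus, cdf)
```

With noisy empirical data the points near $F = 0$ and $F = 1$ dominate the scatter, so a weighted or maximum-likelihood fit is usually preferable to this plain regression.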

The common structure of the aging term found here
and in the network of paper citations [21] suggests that
common patterns of organization and evolution might
be shared. The paper citation graph, obtained by looking
at the list of references included in each paper, is in
fact close to the basic rules defining the patent citation
graph. In both cases, cross-links are associated with some
underlying set of features which are shared by patents or
papers. As occurs in the patent case, new papers
are based on previous ones providing the background
required to build a new idea. On the other hand, as new
ideas and concepts develop into well-defined areas, they
will tend to attach less to more generic or older works.






Additionally, the observed modular organization
might also contribute to deviations from the simple power-law
attachment assumed in previous theoretical studies.
What seems clear is that there might be some universal
trends canalizing the growth of innovation networks,
whether scientific or technological.

V. DISCUSSION 

The patterns of innovation emerging in our society are
the outcome of an extensive exchange of shared information
linked with the capacity of inventors to combine and
improve previous designs. Even very original inventions
are not isolated from previous achievements. A patent
can be identified as an object which needs a minimum
amount of originality to be considered as truly different
from previous patents. Moreover, to be granted, it must
properly refer to related patents in a fair way. Such
constraints make this system especially interesting, since we
can reasonably conjecture that it represents the expansion of
real designs through some underlying technology landscape.
These designs can be just small improvements or
large advances. Our analysis provides a quantitative
description of this evolving structure using the tools of
statistical physics.

We have shown that the underlying rules of network
change for our system reveal a mixture of preferential
attachment, favouring a rich-gets-richer mechanism,
together with an aging term weighting the likelihood of
citing old patents. As the network grows, recent patents will
tend to cite recent designs (since innovation is likely to
involve redefining recent inventions) and be less likely to
link to old patents. The consequence of this, as predicted
by previous mean field models (refs), is that the expected
scaling law in the degree distribution associated with
preferential attachment kernels will be modified in significant
ways. Here we have shown that the network of patents,
defined by using the in-degree as a surrogate of patent
relevance, scales as $P(k) \sim (k + k_0)^{-\gamma}$ with $\gamma > 4$. This is
not far from previously predicted scaling laws (DM) associated
with preferential attachment and power-law aging (i.e.,
$f(\tau) \sim \tau^{-\alpha}$), which predict $P(k) \sim k^{-\gamma(\alpha)}$ (with $\gamma \approx 4$
for $\alpha \approx 0.5$). However, the humped shape of our aging
term (as described by the Weibull distribution) makes
it necessary to modify these approximations.

As a final point in our discussion, it is worth noting
that we have strong correlations among patents, indicating
a complex organization in modules. As shown by
the example in figure 1, together with the nonlinearities
associated with the attachment rules, there is some
underlying community structure in the patent network that
deserves further exploration. The emergence of modules
is a natural consequence of the specialized features shared
by related patents. But it might also reveal the structure
of the innovation landscape itself: new patents related to
previous ones can also be understood as improved solutions
that explore the neighborhood of previous solutions.
This view would provide a quantitative picture of
the topology of technology landscapes [?, 24]. Such an
evolutionary interpretation in terms of fitness functions
will be explored elsewhere.